PREVIEW 


How Safe Is 
Safe Enough? 


Measuring and Predicting 
Autonomous Vehicle Safety 


Philip Koopman, Ph.D. 


Carnegie Mellon University 




For Cindy, Moira, Brynn, and Ben. 


First Edition, 2022. (Version 1.0.00) 


Copyright © 2022 by Philip Koopman 

All rights reserved. 

ISBN: 9798846251243 Trade Paperback 
ISBN: 9798848273397 Hardcover 


No part of this publication may be reproduced or transmitted in any form or 
by any means, including photocopy, scanning, recording, or any information 
storage and retrieval system, without permission in writing from the 
copyright holder. 


INFORMATION IN THIS BOOK IS PROVIDED “AS IS” AND ANY 
EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO 
EVENT SHALL THE AUTHOR OR PUBLISHER BE LIABLE FOR ANY 
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, 
CONSEQUENTIAL, OR OTHER DAMAGES, EVEN IF ADVISED OF 
THE POSSIBILITY OF SUCH DAMAGES. THE INFORMATION IN 
THIS BOOK IS INTENDED TO ONLY PARTIALLY SUPPORT SAFETY 
PRACTICES, AND MORE IS REQUIRED TO ACHIEVE ACCEPTABLE 
SAFETY. YOU ARE RESPONSIBLE FOR THE SAFETY OF THE 
SYSTEMS YOU DESIGN AND OPERATE REGARDLESS OF THE 
CONTENT OF THIS BOOK. THE AUTHOR IS NOT A LAWYER AND 
NOTHING IN THIS BOOK SHOULD BE CONSIDERED AS LEGAL 
ADVICE. BY OPENING THIS BOOK, THE READER AGREES TO 
THESE TERMS AND ACCEPTS SOLE RESPONSIBILITY FOR ANY 
AND ALL DIRECT AND INDIRECT USES OF ITS CONTENTS. 


Contents 
Preface
1. Introduction
1.1. Scope
1.2. A whirlwind tour
1.3. Terminology and abbreviations
2. Terminology and challenges
2.1. SAE J3016 key terms
2.2. SAE Levels and safety
2.2.1. A simplified version of J3016 Levels
2.2.2. A nuanced description of J3016 Levels
2.2.3. Clearing up SAE J3016 misconceptions
2.2.4. J3016 Levels augmented for safety
2.2.4.1. Safe Level 2 features need effective driver monitoring
2.2.4.2. Safe Level 3 features also need effective driver monitoring
2.2.4.3. Safe Level 3 features via restricted ODD
2.2.5. Public road testing
2.3. Vehicle automation modes
2.3.1. The four operational modes
2.3.1.1. Driver Assistance
2.3.1.2. Supervised Automation
2.3.1.3. Autonomous Operation
2.3.1.4. Vehicle Testing
2.3.1.5. Driver liability
2.3.1.6. Other safety considerations
2.3.2. Active safety systems
2.3.3. Which mode is safest?
2.4. Key Safety Challenges
2.4.1. Sense-Plan-Act and more
2.4.2. Sensor limitations
2.4.3. Sensor fusion and classification challenges
2.4.4. World model limitations
2.4.5. Prediction challenges
2.4.6. Planning challenges
2.4.7. Uncertainty
2.4.8. Motion challenges
2.4.9. Reaction challenges
2.4.10. Monitoring and dependability challenges
2.4.11. Lifecycle challenges for sustained operation
2.4.12. Unknown unknowns
2.4.12.1. Unknowns are not addressed by the ODD
2.4.12.2. ODD model deficiencies
2.4.12.3. Knowing when the AV does not know


2.5. Conclusion
3. Risk acceptance frameworks
3.1. Survey of risk acceptance frameworks
Ua N eg PaUAR P AAR ca veestconrtaciecbaeecbbaosiectee daopces oontdalatsenveccebagendiered 56 
Do TINA iiss incia seed enlangalieakaia ess bustin Medias aaicaeceadieNdaN 57 
SG sc MEE IN sc caerulea tc ht Bice cascian ions lcea Nac alae Ca 58 
UA, MGR oh op en ceipst cae ceeresepceenes Cod toupee coment monet eenee 59 
Bis Lag pI ASI carats neces are vacbcs denen naa leases navnabeieate 59 
3.1.6. Risk table approaches
UD RAPE ace aees es vce vreaivs ace te apcs ces bnda se ceewerscsuenewacenen edvecsenndasecees 61 
3.1.6.2. SIL approaches
3.2. GAMAB as the default AV risk framework
3.3. Summary
4. What people mean by “safe”
4.1. Safer than a human driver
4.1.1. Fewer crashes than human-driven vehicles
4.1.1.1. Comparable driving conditions
4.1.1.2. Crash severity distribution
4.1.1.3. Passive safety systems
4.1.1.4. Comparable active safety systems
4.1.2. Positive Risk Balance: better than human
4.1.2.1. Which human driver is the baseline?
4.1.2.2. Victim demographics
4.1.3. Enhanced personal safety
4.2. Behaves safely on the road
4.2.1. Following traffic laws
4.2.2. Good roadmanship
4.2.3. Provably safe driving behavior
4.2.4. Does better at avoiding crashes
4.3. Road testing
4.3.1. Millions of miles
4.3.1.1. Millions of miles of road testing
4.3.1.2. Representative miles
4.3.1.3. Which version of the system?
4.3.2. Miles between disengagement
4.4. Simulation
4.4.1. Billions of simulation miles
4.4.1.1. Simulation scope
4.4.1.2. Simulation ability to predict real-world safety
4.4.1.3. Long-tail simulation events
4.4.2. Shadow mode testing
4.4.2.1. Shadow mode challenges
4.4.2.2. Shadow mode ODD coverage
4.5. Safety standards
4.5.1. We use concepts from safety standards
4.5.2. We use proprietary safety approaches
4.5.3. We use a safety case
4.5.4. We conform to safety standards
4.5.5. We conform to FMVSS
4.5.6. We got five stars on crash tests
4.6. Risk management and insurance coverage
4.6.1. We do risk management
4.6.2. The cost of risk forces us to be safe
4.6.3. The insurance company gave us a policy
4.6.3.1. Insurance policies do not make you safe
4.6.3.2. Low premiums do not necessarily mean low harm
4.6.3.3. Insurance premiums will not force acceptable safety

4.7. Safety culture
4.7.1. Trust us, we are smart and work hard
4.7.2. Safety is our #1 priority
4.7.3. We have a robust safety culture
4.7.4. We have excellent system engineering
4.8. Human drivers are bad
4.8.1. Human drivers are terrible, so AVs will be safe
4.8.2. AVs won't drive drunk
4.9. An AV safety hierarchy of needs
4.10. Misleading AV industry talking points regarding safety
4.10.1. 94% human error
4.10.2. Regulation vs. Innovation
4.10.3. We already have regulations
4.10.4. Recalls and lawsuits mitigate risk
A105. Standards??? iss; sucsscvasevsevaisvicunoess ceases; sidsbteavesnstsnierchaeedeanpvieense 126 
4.10.6. A patchwork quilt of regulations
4.10.7. The spirit of the standard
4.10.8. Government regulator skills
4.10.9. Revealing the secret autonomy sauce
4.10.10. Delaying AVs is murder
AVON INO Ceaths 00 VOt cscs sccssicenieieatsialectsesstecasssatieiaedadesareeteceesiaeds 135 
4.10.12. Fear of Missing Out
4.10.13. Testing deaths are a regrettable necessity
4.10.14. Self-certification
4.11. Summary
5. Setting an acceptable safety goal
5.1. Positive Risk Balance
5.2. Baseline human-driven vehicle safety
5.2.1. Baseline US road fatality and injury rates
5.2.2. Risk variation due to operational conditions
5.2.3. Risk variation for different types of road users
5.2.4. Risk variation due to driver experience
5.3. How positive should Positive Risk Balance be?
5.3.1. Consumer attitudes and trust
5.3.1.1. Consumers want far better than minimal PRB
5.3.1.2. The dread risk effect
5.3.1.3. Over-trust and loss of trust
5.3.2. The RAND report and 10% safer
5.3.3. The case for 10 to 100 times safer than human drivers
5.4. Summary: an acceptable PRB baseline
6. Measuring Safety
6.1. Leading vs. lagging metrics
6.2. Vehicle-level metrics
6.2.1. Crashes and other loss events
6.2.2. Measuring road miles
6.2.3. Disengagement metrics
6.2.4. Road testing safety metrics
6.2.5. Driving test metrics
6.2.6. Coverage metrics
6.2.7. Physics-based risk metrics
6.2.7.1. Criticality metrics
6.2.7.2. Newtonian physics
6.2.7.3. Assumptions and measurement uncertainty
6.2.7.4. Best effort when safety cannot be proven
6.2.7.5. Permissiveness vs. safety
6.3. Engineering metrics
6.3.1. Planning metrics
6.3.1.1. Avoiding collisions and near hits with buffers
6.3.1.2. Scenario and ODD coverage
6.3.2. Perception metrics
6.3.2.1. Classification accuracy
6.3.2.2. Sensor fusion and common cause failures
6.3.2.3. Edge cases
6.3.3. Prediction metrics
6.3.3.1. Keep-Out Zone
6.3.3.2. Motion extrapolation
6.3.3.3. Prediction motion changes
6.3.3.4. Object permanence
6.3.4. ODD metrics
6.3.5. Surprise metrics
6.3.5.1. Software reliability growth modeling
6.3.5.2. Surprise arrival rate modeling
6.3.6. Conformance and engineering rigor metrics
6.3.7. Causal rather than correlative metrics
6.4. Metric proposal
6.4.1. Safety Performance Indicators (SPIs)
6.4.2. AVSC metrics
6.5. Metric thresholds


6.5.2. Extreme values matter more than average metrics
6.5.3. Metric thresholds
6.6. Summary
7. Safety cases
7.1. Safety Cases and SPIs
7.1.1. Safety case structure and notation
7.1.2. Safety cases and leading indicators
7.1.3. SPIs used in safety cases
7.1.4. Safety Performance Indicators revisited
7.1.4.1. Importance of SPI threshold values
7.1.4.2. SPIs as statistical monitors
7.1.4.3. SPI vs. runtime safety monitoring
7.1.4.4. What does an SPI violation actually mean?
7.2. Real-world safety cases are not deductive
7.2.1. Defeasible reasoning and defeaters
7.2.2. SPIs and defeasible safety cases
7.3. Summary


8. Applying SPIs in practice
8.1. Creating a metric target for safety
8.1.1. Lagging incident metrics
8.1.2. Baseline operational conditions
8.1.3. Baseline victim demographic profile
8.1.4. Overall baseline metrics
8.1.5. Confusing performance metrics with safety metrics
8.2. Identifying leading SPIs
8.2.1. System-level leading SPIs
8.2.1.1. Inherently risky vehicle behavior
8.2.1.2. Loss of confidence in vehicle design
8.2.2. Component-level leading SPIs
8.2.3. Leading SPIs for process
8.2.4. Leading SPIs for operations
8.2.5. System-level leading KPIs
8.3. Monitoring SPIs
8.3.1. Sources of SPI data
8.3.2. How to measure

8.3.2.1. Temporal consistency
8.3.2.2. Cross-function consistency
8.3.2.3. Sanity checks
8.3.2.4. Safety envelopes
8.3.2.5. Latent redundancy failures
8.3.2.6. Coverage
8.3.2.7. Failure rates
8.3.2.8. Requirement violations and temporal logic
8.3.2.9. Process hygiene
8.3.2.10. Root cause analysis
8.3.3. Computing SPI violations
8.3.3.1. On-vehicle discrete events
8.3.3.2. Fleet-wide vehicle trends
8.3.3.3. Simulation and test SPIs
8.3.3.4. Process trends

Be A PS asst vessne cede cearvccentateosiesiersnctncteadesemneensbetenesee 262 
8.3.3.6. Multiple measurements for an SPI threshold
8.3.3.7. Data computation and transmission
8.3.3.8. Statistical confidence for online measurements
8.3.4. Responding to a leading SPI violation
8.3.4.1. The meaning of an SPI violation
8.3.4.2. SPI violation response plan
8.3.5. Root cause contributing factors
8.4. SPIs and bootstrap safety argument
8.4.1. Bootstrapping starting with a pilot deployment
8.4.2. Bootstrapping a safety case with SPIs
8.5. Summary
9. Deciding when to deploy
9.1. An approach to setting deployment criteria
9.2. Addressing uncertainty
9.2.1. Risk management
9.2.2. Uncertainty estimation via SPIs
9.3. Conformance to standards
9.4. Safe updates
9.5. Road testing safety
9.5.1. Safety driver effectiveness
9.5.2. In-vehicle safety drivers
9.5.3. Road testing with a telepresent safety operator
9.5.3.1. Role of remote safety operator
9.5.3.2. Active vs. passive remote monitoring
9.5.4. Road testing with a Big Red Button
9.5.5. Road testing with a conductor
9.5.6. Beta testing
9.5.7. Uncrewed road testing
9.6. Summary
10. Ethical AV deployment
10.1. The Trolley Problem
10.1.1. Problems with the Trolley Problem
10.1.2. The solution to the Trolley Problem
10.2, Skin thea es: cdsecsecasseviesntevdaasvieseveioriaemateateaian acne 312 
10.2.1. The Moral Crumple Zone
10.2.2. Moral Crumple Zones as a design strategy
10.2.3. Insurance as ethical insulation
10.3. Deployment governance
10.3.1. Thought experiment: do you deploy under pressure?
10.3.2. Real-world AV deployment governance
10.4. Other ethical issues for AVs
10.4.1. Occupant vs. pedestrian safety
10.4.2. Risk transfer and demographic group
10.4.3. Bending the rules and breaking the law
10.4.3.1. Ambiguities and doing the right thing
10.4.3.2. The Tesla FSD rolling stop recall
10.4.4. Passenger overrides and urgent egress
10.4.5. Foreseeable misuse and abuse
10.4.6. Fatalities vs. time to market
10.4.7. Blaming the Computer
10.4.8. The blame game
10.4.9. Harm now, benefits later
10.4.10. Transportation system interactions
10.5. The effect of AV company business model
10.6. Ethical regulatory approaches
TAD MS RU acct ccs dare seats teceea tae ie ego el escapee tReet 343 
10.6.2. Responsibility for loss (compensation)
10.6.3. Transparency
V0:6:4. INCISION ss sssdecavecadasstcessavasccies saa ccasageideadsterseagaaasactan etegcansaea decays 345 
10.6.5. Non-Discrimination
10.7. Summary
11. Conclusions
11.1. Wrap-up
11.2. Resources
11.2.1. “Safe enough” resources
11.2.2. Educational resources
11.2.3. Other resources
11.3. About the author


Preface 


The promise of autonomous vehicles 


The promise of fully autonomous cars that drive themselves has beckoned 
for decades. It is typical for an American to spend an hour a day in a car, 
with much of that time spent in a relatively unpleasant commute to work 
rather than enjoying the idyllic lure of an open road adventure that is so 
baked into the culture. Wouldn’t it be nice if we could watch a movie, take a 
nap, or otherwise relax instead of jockeying for position with all the other 
commuters? Or maybe we could go to sleep in our garage and wake up in 
another city, our personal luxury transportation pod having let us sleep away 
the boring hours of an all-night drive to the next business meeting. 

Other potential applications for autonomous vehicle (AV) technology 
abound. They include a potential major restructuring of long-haul trucking, 
parcel delivery, public transportation, and in particular, dramatically 
increased access to transportation for those who cannot drive. There are 
tradeoffs involved, and it remains to be seen how things will play out. But 
with tens of billions of dollars pouring into investments in the technology, 
expectations are set high. 

A salient AV promise is dramatically improving road safety. Indeed, the 
lead selling point has come to be that deploying the technology is urgent 
because every year it is delayed means more people die on our roadways. 

However, the topic of safety is far more complicated than the facile 
talking points usually involved, such as “computers won’t drive drunk so of 
course they will be safer than human drivers.”¹ On the other hand, it is 
unreasonable to expect AVs to be perfectly safe. Rather, AVs should be 
acceptably safe, achieving some balance between the benefits they provide 
and the risk they impose on society. 

A common notion is that AVs will be safe enough if they are better than 
human drivers. While intuitively appealing, that simple criterion is unlikely 
to work in practice. First, “safer than a human driver” is much more 
complicated than it might seem if you need to address which driver, 
operating where, and under what conditions. Second, other considerations 
need to be addressed such as how much redistribution of risk is permissible. 
Is it OK to kill twice as many pedestrians if the total number of fatalities, 
including passengers, decreases? And third, the technology is so immature that 
predicting the safety of an AV before it is deployed is a major challenge. 

In reality, nobody yet knows if AVs will be as safe as human drivers when 
deployed at scale. We hope that will be the case, but it might not even be 
possible with our current technical abilities beyond a small number of benign 
environments with highly constrained capabilities. Or we might just be 
another few billion dollars of investment and a year away from self-driving 
utopia.² Regardless, we would like to do better than deploy with insufficient 
evidence that the technology is safe and just see how it turns out. 

1 This particular fallacy is debunked in section 4.8.2. 

Being able to know that an AV is safe enough before we have experience 
with at-scale deployment is no easy problem. But if we cut corners on being 
able to ensure acceptable safety, it seems likely that high-profile crashes are 
inevitable. A pattern of such crashes — or even one especially horrific event — 
might cause society to reject the technology, setting back progress a decade 
or more.³ 

AVs have been the technology of the future for decades. Maybe this will 
be the time the technology really deploys at scale. Certainly I’d welcome 
improved road safety and the ability to relax instead of having to concentrate 
on drives I find boring. But this technology will not be viable if the public 
does not find it acceptably safe. To get there we need to understand not only 
what acceptably safe means, but also how we might measure whether we are 
there or not. That is the scope of this book. 

An important scope disclaimer is in order. This book does not address the 
significant challenges involved in taking a machine learning-based 
technology and making it safe. That topic is crucially important, but is an 
entirely different area that is still in flux. So this is not about telling anyone 
how to design a safe AV. Rather it is about how to structure a way to 
evaluate whether the designers have actually achieved their goal of being 
safe enough. 


Why should I listen to this guy? 


I started working on AV safety in the mid-1990s as a member of the 
Carnegie Mellon University Navlab team as part of the Automated Highway 
Systems (AHS) project run by US DOT Federal Highways.⁴ That was years
before the DARPA grand challenges. The work culminated in a 1997 demo
on a closed highway in San Diego.⁵ At the AHS demo, Carnegie Mellon
demonstrated camera-based lane following technology on not only cars, but
also a pair of city buses. Berkeley PATH demonstrated platooned cars guided
by magnets embedded in the roadway. A number of other organizations also
produced useful technology and engineering analysis,⁶ but in the end there
was no planned path forward, and the idea went dormant in the public eye for 
almost a decade before DARPA picked up the topic. 


2 Unlikely to happen. More likely, it will be many years and many more tens of
billions of dollars before this technology can deploy at scale. 

3 While different in many ways, the history of the nuclear power industry is an 
important cautionary tale for what happens when high-profile loss events occur after 
society has been assured that a technology is safe. 

4 See this AHS status report by Lay et al. from 1996:
https://rosap.ntl.bts.gov/view/dot/38381

5 See Thorpe, Jochem & Pomerleau, 1997:
https://www.ri.cmu.edu/pub_files/pub2/thorpe_charles_1997_1/thorpe_charles_1997_1.pdf

6 See Bishop, Dopart & Shladover 1997:
https://path.berkeley.edu/sites/default/files/demo97foravs17v6.pdf


I was not part of the DARPA Grand Challenges, but I was involved in 
other ground robotics safety and robustness via work with a team at the 
National Robotics Engineering Center (NREC) at Carnegie Mellon 
University.⁷ NREC and its parent Robotics Institute have produced many of
the key players in the AV industry today. However, I’m not a “robo-grad” as 
they are called. Rather, I have worked at the engineering school with a 
concentration on dependability and safety. During the initial AHS project and 
later the decade or so I spent working with NREC, I learned about 
autonomous vehicles and spent a lot of time thinking about safety. 

I also have considerable experience with non-autonomous embedded 
system software design practices and safety in a number of other industries. 
I’ve had research funding, industry experience, and hundreds of consulting 
engagements covering conventional automotive, railway, chemical process, 
aviation, factory automation, building automation, vertical transportation, 
electrical power, consumer goods, combat systems, chip design, and even 
medical applications. I have also dealt with safety standards across those 
fields. Additionally, I have up-close and personal lived experience with 
applied safety practices from my time as a US Navy submarine officer, 
where so many things must be done perfectly if you and your shipmates want 
to avoid having a very bad day. Finally, I have seen the inside of a courtroom 
and other legal processes while working as an expert witness.⁸

More recently I have become involved in safety standards specific to AVs 
and processes that are creating AV regulations. Sensing a reluctance of the 
industry to commit to AV-specific standards, I spearheaded an effort to 
create ANSI/UL 4600, which is aimed at ensuring that an AV is acceptably 
safe.⁹ As I write this, ANSI/UL 4600 is well on its way to being updated to a
third edition to fully encompass not only light vehicles, but also heavy
trucks.¹⁰ I am also active on several other industry standards committees that
deal with conventional and autonomous vehicle safety. 

After those experiences and more, I feel that I have as much visibility and 
insight into the problems with AV safety as anyone can have in such a fast- 
moving and secretive world. I hope that this book makes it easier for others 
to understand the various challenges and potential solutions I’ve seen along 
the way. 


Audience 


This book is intended to be useful for a wide range of stakeholders who 
are interested in ensuring that AV technology is deployed safely. That 
includes engineers, regulators, legislative technical staff, government affairs
staff, insurers, technical journalists, students, mobility experts, and
technology enthusiasts. Rather than attempting to write for a single uniform
audience, each section is written in a way that is as accessible as I can
manage while still not holding back on detail relevant to deeper
understanding of the core issues for specialists.

7 See: https://www.nrec.ri.cmu.edu/

8 For a one-hour lecture on what I learned in one such case, see:
https://youtu.be/DKHa7rxkvK8

9 See Koopman et al. 2019:
https://users.ece.cmu.edu/~koopman/pubs/Koopman19_WAISE_UL4600.pdf

10 For a simple starting page for ANSI/UL 4600 information, see:
https://users.ece.cmu.edu/~koopman/ul4600/index.html

Depending on the reader’s background some sections will be more 
accessible than others. I’ve provided summaries for second-level and some
third-level headings to follow the main flow for those who might find some
sections less relevant to their needs. If you find your eyes glazing over at 
some point, feel free to skip ahead to the next summary paragraph to get a 
change of topic. If you are relatively new to the area of AV safety in general, 
you might want to start with my free video short course on AV safety to get 
up to speed before diving into the details in this book.¹¹


Book Organization 


The chapters of the book are organized as follows: 


• Chapter 1 provides a light introduction and whirlwind tour of the
material in the book.

• Chapter 2 goes over terminology, vehicle automation modes, and key
safety challenges that need to be addressed to be able to say an AV is
acceptably safe.

• Chapter 3 covers risk acceptance frameworks. It turns out there is more
than one way to frame the question of what risk might be acceptable.

• Chapter 4 covers what people mean by “safe.” This chapter is a result of
having been in too many discussions where people were talking past
each other meaning completely different things by the word “safe.”¹² It
also includes a list of misleading industry-promoted talking points that
are harmful to productive discussion about acceptable safety.

• Chapter 5 discusses how to set an acceptably safe goal, including setting
a comparative safety baseline and accounting for things beyond simply
total number of fatalities involved in crashes.

• Chapter 6 discusses how to measure and predict safety in more detail. It
is not enough to count up the losses after the fact. There needs to be a
way to build confidence before deploying.

• Chapter 7 covers safety cases and how Safety Performance Indicators
(SPIs) can be integrated into safety cases to provide safety metrics
supporting a “safe enough” decision-making process. I believe the
concept of SPIs as presented will be a key to deploying AVs safely at
scale.

• Chapter 8 deals with how to identify, monitor, and respond to SPI-based
metrics. Coming up with an actionable and measurable safety case is the
hard part. This book frames the situation, but is not a deep dive into the
nuances of safety case construction itself.

• Chapter 9 (finally!) outlines how to decide when it is ethically
responsible to deploy an AV despite inevitable uncertainty. It also covers
how to ensure acceptably safe road testing, which is more difficult to do
safely than simply putting a driver in the vehicle and telling them not to
crash.

• Chapter 10 discusses some ethical issues relevant to AV safety that will
need to be addressed before the technology can be deployed at scale,
including regulatory considerations. Spoiler: the infamous Trolley
Problem is not what we should be spending time talking about.

• Chapter 11 wraps up, presenting pointers to resources readers might find
helpful.

11 Short course lectures are hosted with open access both on YouTube and
Archive.org, including both video lectures and slides:
https://users.ece.cmu.edu/~koopman/lectures/index.html#av

12 Too often, I myself have been a participant in the talking-past exercise. This
chapter is in part a reflection to help me get better at not doing that. More
importantly, if we don’t even know what we mean by “safe” we cannot have an
intelligent discussion about “how safe.”


Writing Practicalities 


This book is more of a discussion than an academic review paper. It is
light on references not directly relevant to the discussion, and even has a bit 
of snark to lighten things up at times.¹³ You will not find an exhaustive
literature survey here, but rather mentions of things that have caught my eye 
as being especially relevant. There is a bit of redundancy across some 
sections because some topics interact with multiple other topics. I’ve tried to 
cross-reference and shorten overlapping discussions, but there is no perfect 
solution for this. If you think I really missed the boat on something let me
know.¹⁴

Part of the informality is that many references are to information on the 
Web, with an emphasis on finding as many open access sources as possible 
rather than paywalled material. Rather than spend time on tedious (and often 
elusive) formal citations, I’ve added URLs. If a URL goes stale, readers are 
encouraged to look up the history of the URL via the Wayback Machine at: 
https://archive.org/ to recover the relevant content. All URLs listed have 
been checked as being active as of July 2022. 

Some footnotes refer to Wikipedia and other non-authoritative sources. 
These references are made because the particular material cited seems like a 
reasonable starting point for those who want to understand more about a 
topic. They are not meant as a definitive justification for a point being made. 
Readers are cautioned that material on Wikipedia might not always be 
accurate. 

I’ve used footnotes to try to avoid derailing the main discussion with
parentheticals and to provide references to resources that are freely
accessible to the maximum degree practical. Rather than typing in all the
URLs, you can find a web page with clickable links here:
https://users.ece.cmu.edu/~koopman/SafeEnough/

13 Safety of life-critical systems is in fact serious business. But education without at
least a little humor is ineffective, even if the topic is serious.

14 We also use the “royal we” in subsequent chapters. It is really just me talking.

Some of the contents of this book have appeared in various forms in blog 
entries, web pages, and the like. However, the new material is substantial, 
and even existing material has been edited or even rewritten for this book. 
This book is definitely not just a rehash of blog posts — not by a long shot. 

Finally, examples use English and Metric units more or less at random in 
an attempt to appeal to users of both systems.¹⁵ Discussions of regulatory
matters emphasize what is happening in the US. Regulatory challenges 
outside the US differ in the details, but those differences are not central to the 
message of this book. 


Acknowledgments 


While the book has been written recently, the path has been long and 
winding. I thank the following with special recognition for their contributions 
both direct and indirect on this path: Michael Barr, Michelle Bayouth, Ensar 
Becic, Sagar Behere, Jen Black, Simon Burton, Missy Cummings, Rami 
Debouk, Wes Doonan, Jackie Erickson, Uma Ferrell, Frank Fratrik, Tom 
Fuhrman, Mallory Graydon, Glen Haydon, Mahmood Hikmet, Daniel 
Hinkle, Yoav Hollander, Michael Holloway, Casidhe Hutchison, Rolf 
Johansson, Aaron Kane, Tim Kelly, John Knight, Katina Michael, Joe Miller, 
Beth Osyk, Brendon Ouimette, Fred Perkins, Jens Pollmer, Deborah Prince, 
Justin Ray, Paula Ranallo, Heather Sakellariou, Steve Shladover, Dan 
Siewiorek, Don Slavik, Zhongxin Sun, Chuck Thorpe, Kim Wasson, Jack 
Weast, Chuck Weinstock, William Widen, Marilyn Wolf, Junko Yoshida, 
David Zipper, Membership of IFIP WG 10.4, and Contributors to ANSI/UL 
4600. 


Everyone can improve, and I’m no exception. If you see something in this 
book that you disagree with, something exceptionally relevant I did not cover 
or, worse, an outright mistake, please let me know via an e-mail to: 
AVSafety@Koopman.us 


Philip Koopman 
Pittsburgh, PA, September 2022. 


15 That is my story, and I’m sticking to it. Consider it trying to be fair to readers of
both systems. 




1. Introduction 


Just make sure the autonomous vehicle is at least as safe as a human 
driver. Really, how hard can it be to figure that out? 


The fact that you are reading an entire book on the topic of “safe enough” 
should be a hint that this problem gets more complex the deeper you go. Find 
a comfy reading place and let’s dig in. 

This book attempts to find answers to a question that seems simple on its 
face: how will we know when Autonomous Vehicles (AVs) are safe enough 
to deploy? 

To answer that question we touch upon terminology, why AV safety is 
difficult, the nature of safety vs. risk, what “enough” might mean for safety, 
dealing with uncertainty, metrics, decision criteria, regulations, and 
accompanying ethical issues. While we go through those topics, do not forget 
that the point of all this is to be able to answer the question: “Is this AV safe 
enough to deploy on public roads?” We seek answers based on knowledge 
and engineering rigor rather than hope, faith, bluster or willful ignorance. 


1.1. Scope 


The scope of this book is a discussion of how to determine that an AV is 
safe enough to deploy. While there is an overview of many of the technical 
challenges to creating an AV, the emphasis is more about how to measure 
whether the result is acceptably safe. That includes defining a risk framework 
as well as metrics that can both predict safety and provide a traceable path to 
the AV design and validation. 

Later chapters have a fair amount of detail on the use of Safety 
Performance Indicators (SPIs) as they are conceived in the ANSI/UL 4600 
safety standard.¹⁶ While there might be other ways to accurately predict AV
safety before deployment, an SPI-based approach is the way we think will 
work best. 

Topics in scope tend to be in the areas of safety engineering, metrics, 
ethics, and regulatory approaches. This book encompasses topics relevant to 
company leadership, regulators, and other stakeholders having a way to 
know that any decision to deploy an AV or test it on public roads is being 
made in a responsible manner after considering relevant factors. 

Topics out of scope for the book are details regarding machine learning 
validation, software safety, and technical details of suitable arguments that 
might be put into a safety case. Much of that can be found in ANSI/UL 4600, 
but it is not our emphasis here. What ANSI/UL 4600 puts out of scope is a
framework for deciding how safe is acceptably safe. That is what we do in
this book.

16 See materials at: https://users.ece.cmu.edu/~koopman/ul4600/index.html


1.2. A whirlwind tour 


Here is a whirlwind tour of the contents of the book. Buckle your seat 
belts! 


Chapter 2: 


To understand how safe we need an autonomous vehicle (AV) to be, we 
first need to understand what we actually mean by “autonomous.” We take 
that to mean a situation in which there is no natural person immediately 
responsible for safety regarding the “self-driving” part of the vehicle. While
it is traditional to use the infamous SAE levels in this discussion, we believe 
the Levels hurt more than help for discussions regarding safety with the 
general public and regulators. We propose an alternate categorization of: 
driver assistance, supervised automation, autonomous operation, and vehicle 
testing. Those categories revolve around the role of the driver rather than the 
technical approach used to implement automation. 

Make sure that you know that an ODD (Operational Design Domain) is 
the set of conditions for which an AV is designed to operate, and that articles 
describing the SAE Levels typically get Level 3 wrong in some way that is 
relevant to safety. 

Autonomous vehicles have key safety challenges at every stage of what is 
often called an “autonomy pipeline” that runs from sensors through 
computations to vehicle outputs. Traditional safety-critical software
approaches make convenient assumptions, such as that the external world is
perfectly understood and that there is one uniquely correct response to every
stimulus. Moreover, software safety typically assumes someone can look at
the software and determine if it is in fact correct. None of that really works 
out for the perception and “AI” parts of AV technology. And there is the 
matter of unknown unknowns — things we do not realize we do not know that 
might nonetheless cause a fatality while driving. It’s a can of worms.¹⁷


Chapter 3: 


A variety of different risk acceptance frameworks might be used. 
Frameworks vary by whether acceptable risk is some value relative to natural 
phenomena or is in comparison to some alternative system. Do you want to 
compare to the risk of death by lightning? Or the risk of death from a human- 
driven car instead of an AV? While comparing to the risk of human-driven 
cars is a popular starting point, it is only a starting point. 


17 If it were easy everyone would already have an AV in their garage. We are not
even close to that now. 




Chapter 4: 


Whenever someone says an AV is “safe” you might be astounded by the 
range of things that can mean. Indeed, if you ask someone what they mean 
by safe you might get a tangled up set of answers instead of a clean 
definition. We break the possible meanings of “safe” down into categories 
including: comparison to human driver, good roadmanship, lots of testing, 
lots of simulation, followed safety standards, is insured, is a product of a 
company that says safety is #1, and cannot be as bad as a human driver 
because it uses a computer instead. 

The thing is, maybe safety is most of those definitions all at once. We 
propose a hierarchy of safety needs to organize all the definitions.¹⁸ We also
tear into more than a dozen myths, talking points, and outright propaganda 
themes that tend to be used to confuse the topic of what safety might be and 
why we should believe AVs will improve safety on public roads. 


Chapter 5: 


The current default for “safe enough” in most discussions is at least as safe 
as a human driver. Understanding what that means requires knowing what 
kinds of harm we are comparing (fatality, injuries), on what types of roads, in 
which states, in which operational conditions, and for which drivers. By the 
way, 60-something year old drivers are the safest.¹⁹

The difficult part is that just being exactly as good as a human driver will 
not be enough because of consumer attitudes and the tendency of AV crashes 
to get more press. A simplistic utilitarian argument is that if an AV is just 
10% safer overall than human drivers it should be deployed because it will 
save lives. However, stakeholders expect computers to be much better than 
human drivers, and engineering margin needs to be included to handle 
inevitable uncertainty about safety predictions. In reality, AVs might need to 
have a predicted safety of 10 to 100 times better than humans when initially 
deployed to be viable. Anything less risks a loss of trust and backlash when 
the crashes start happening in the real world. 


Chapter 6: 


You cannot measure safety without putting numbers on things. Lagging 
metrics tend to measure outcomes, whereas leading metrics try to predict 
how safety will turn out later. However, the leading vs. lagging thing is 
relative to other metrics on a spectrum rather than a clear-cut distinction. 

There are all sorts of metrics that might be useful, including measuring
different stages of autonomy pipeline performance, engineering rigor, and
even road miles. Disengagements are probably not that helpful in predicting
safety.²⁰ A good leading metric needs to be predictive of safety rather than
just correlative or it is prone to being gamed. We like ANSI/UL 4600-style
Safety Performance Indicators (SPIs) because they are linked to a safety case
(see next chapter).

18 Maslow comes into play. Who said freshman psychology was a waste of time?

19 Bet younger readers did not see that one coming! For older readers keep this in
mind the next time someone tries to age-shame you on social media about AV safety.
It has certainly come in handy for us.

Metrics need a pass/fail threshold criterion or they cannot be used to 
answer the “safe enough” question. The numbers we need are not “more is 
better” but rather “is this number good enough to meet the safety goal?” The 
average probably is not what matters for most metrics. Safety is not about the 
99,999,999 miles where there was no fatal crash — it is about the 1 mile 
where there was a fatal crash. That means safety metrics need to be 
especially good at measuring and predicting very infrequent but high 
consequence events. Safety is about other harms too, but fatalities tend to be 
the headline issue. 
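
To get a feel for why rare events make measurement hard, here is a minimal sketch of how many crash-free miles it would take before a fatality-rate claim is statistically supported. It is our own illustration rather than a method taken from a later chapter. It assumes fatal crashes arrive as a Poisson process with a constant per-mile rate, which is a simplification. The function name and the example target of one fatal crash per 100 million miles are hypothetical.

```python
import math


def crash_free_miles_needed(target_rate_per_mile: float,
                            confidence: float = 0.95) -> float:
    """Crash-free miles needed to support a claim about the fatal crash rate.

    Assumption (ours): crashes follow a Poisson process, so the chance of
    seeing zero crashes in n miles is exp(-rate * n). We solve for the n at
    which that chance drops to (1 - confidence) when the true rate equals
    the target, so that a clean record becomes meaningful evidence.
    """
    if target_rate_per_mile <= 0.0:
        raise ValueError("target_rate_per_mile must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / target_rate_per_mile


if __name__ == "__main__":
    # Hypothetical target: one fatal crash per 100 million miles.
    target = 1 / 100_000_000
    print(f"{crash_free_miles_needed(target):,.0f} crash-free miles")
```

For the illustrative target this works out to roughly 300 million crash-free miles at 95% confidence, and any fatal crash along the way pushes the requirement higher.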


Chapter 7: 


Safety cases provide a structured argument based on evidence that a 
particular claim for safety is true. Once we have defined what we mean by 
“safe enough” we should build a safety case to convince stakeholders that a 
claim of “safe enough” is true based on a reasonable argument backed up by 
evidence. A Safety Performance Indicator (SPI) is a metric that is directly 
attached to a claim that can monitor if the claim is falsified (disproven). If 
your claim is that you never get too close to a pedestrian, the SPI metric 
looks at how often that happens,²¹ and raises an alarm if the claim is
invalidated too often.²²
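
As a rough illustration, an SPI can be pictured as a counter attached to one claim, plus a violation budget. The sketch below is our own simplification and is not code from ANSI/UL 4600. The class name, its fields, and the budget of one violation per million hours are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SafetyPerformanceIndicator:
    """Hypothetical SPI: one safety case claim plus a violation budget."""
    claim: str
    budget_per_hour: float       # tolerated violation rate (assumed units)
    exposure_hours: float = 0.0  # operating hours observed so far
    violations: int = 0          # times the claim was observed to be false

    def record(self, hours: float, violations: int = 0) -> None:
        """Add observed operating time and any claim violations seen in it."""
        if hours < 0 or violations < 0:
            raise ValueError("hours and violations must be non-negative")
        self.exposure_hours += hours
        self.violations += violations

    def observed_rate(self) -> float:
        """Violations per hour so far (0.0 before any exposure is recorded)."""
        if self.exposure_hours == 0:
            return 0.0
        return self.violations / self.exposure_hours

    def alarm(self) -> bool:
        """True when the observed rate exceeds the budget (naive comparison)."""
        return self.observed_rate() > self.budget_per_hour


if __name__ == "__main__":
    spi = SafetyPerformanceIndicator(
        claim="The AV never gets too close to a pedestrian",
        budget_per_hour=1e-6,
    )
    spi.record(hours=500_000)                 # no violations yet
    print(spi.alarm())
    spi.record(hours=500_000, violations=2)   # two close calls observed
    print(spi.alarm())
```

The alarm here is a plain point comparison against the budget. A real monitor would also need to account for statistical uncertainty at very low event rates.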

Another issue with safety cases in the real world is that the safety case will 
have omissions, because it is impossible to guarantee you have fully 
analyzed an open world operational environment. There is always some 
safety issue that you have not thought of, or that will not even exist until after 
you deploy. This means safety cases are only somewhat about
mathematically deductive proof, because they need to grapple with the
inevitability of unknown unknowns. More important are robustly supported
claims that are true, as far as we know.²³

20 Disengagements might have seemed like a good idea at the time, and kudos to
California for trying to promote data transparency for AV testing. But it’s time to
move on to something that reflects testing safety outcomes. On the other hand, crash
descriptions are proving a lot more useful as an impetus for safety transparency.
See: https://www.dmv.ca.gov/portal/vehicle-industry-services/autonomous-vehicles/autonomous-vehicle-collision-reports/

21 Remember the part where safety is about very low probability events? The
threshold might be once every million hours. But most thresholds will not be zero
because no system is perfect. Safety does not quite require perfection, but it can get
extremely close to that, so we need SPI thresholds that can cope with very low
probabilities.

22 Wait: if a claim is only a little false, is that OK? The difference between a
mathematically pure safety case and the real world is right here staring you in the
face. A claim that is almost never false can be good enough. Whether the “almost
never” is built into the claim or into the metric associated with the claim is a
design choice. This is a slippery point, so we spend time on this in the chapter.


Chapter 8: 


Now we get into applying SPIs with numerous examples of the types of 
things that might be measured. We revisit a baseline safety metric target with 
more detail, and warn that SPIs need to directly match up with claims in the 
safety case to be useful. Any metric that does not directly trace to a claim in 
the safety case might be interesting, but is of questionable prediction validity. 

Leading SPIs can be defined at the system level (did the vehicle do 
something it is not supposed to — even if no crash resulted?), components (is 
your camera struggling to see things it should be able to see?), process (is 
your development team skipping required reviews, analysis, etc.?), and 
operations (are you skipping required maintenance?).

Monitoring SPIs can be a bit tricky. Even in an unsafe AV, safety SPIs 
will have violation budgets so low that many vehicles will never violate the 
SPI. This means you will need to aggregate data across vehicles to check SPI 
violation rates. Also, an SPI violation does not (necessarily) mean a vehicle 
is about to crash. Rather, it means your safety case has a defect. You need to 
fix that, but it is a much more indirect safety warning compared to “you are 
about to crash” type metrics. 
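
The aggregation step can be sketched as follows. This is a minimal illustration under our own assumptions, not the procedure from the chapter: violations are treated as independent Poisson events across the fleet, and the function names, the budget, and the 5% significance level are hypothetical choices.

```python
import math


def poisson_tail(k: int, mean: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mean)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k):   # accumulate P(X = 1) .. P(X = k - 1)
        term *= mean / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def fleet_spi_check(per_vehicle: list[tuple[float, int]],
                    budget_per_hour: float,
                    alpha: float = 0.05) -> tuple[float, int, bool]:
    """Pool (hours, violations) pairs across vehicles and test the budget.

    Returns total hours, total violations, and a flag that is True when the
    pooled violation count would be surprising (probability below alpha)
    if the fleet were actually operating within the budget.
    """
    total_hours = sum(hours for hours, _ in per_vehicle)
    total_violations = sum(count for _, count in per_vehicle)
    expected = budget_per_hour * total_hours
    p_value = poisson_tail(total_violations, expected)
    return total_hours, total_violations, p_value < alpha


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 vehicles at 2,000 hours each. Only three
    # vehicles ever saw a violation, so most vehicles have a clean record.
    fleet = [(2000.0, 0)] * 997 + [(2000.0, 2)] * 3
    print(fleet_spi_check(fleet, budget_per_hour=1e-6))
```

In this made-up fleet no single vehicle's record says much on its own, but the pooled count of six violations against an expected two is enough to flag the safety case for review.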

It seems that everyone wants to “bootstrap” safety by doing a little testing, 
having no crashes, and then using that to argue they can operate safely even 
if the total amount of testing is inadequate for a confident safety prediction. 
The math is seductive, but the math does not answer the question that really 
matters. In practice, a typical bootstrap approach amounts to getting lucky 
rather than being safe. We give an alternative approach based on measuring 
SPI failure rates and safe failure fractions rather than bootstrapping based on 
lack of crashes. 


Chapter 9: 


Finally we get to figure out how to make a deployment decision. Beyond 
net risk (on average “safer than human”) are the issues of risk distribution
equity, whether best practices for other aspects of safety have been 
considered including safety standards, and how uncertainty regarding 
expected safety is being handled. Moreover, software updates need to be 
done in a way that does not undermine safety or security. 

A special aspect of ensuring overall safety is being sure that road tests are 
safe. This is significantly different than deployment safety, because safe 
outcomes for public road testing are all about the human safety driver rather 
than autonomy computers. 


23 If you are getting concerned about how we can say that an incomplete safety case
is good enough, look up “defeasible reasoning.” If that helped, then great. If not, 
don’t worry — we’ll try to do better than Wikipedia when we get to that chapter. 




Chapter 10: 


Having discussed how to consider a deployment decision, we revisit 
themes from the book in the context of ethical concerns involved with AV 
safety, with an emphasis on the practical. The infamous Trolley Problem is 
not what you should be worrying about. Rather, the biggest issue is how 
people who have huge financial and professional incentives to deploy will 
handle a deployment safety decision — given that they are not going to be the 
ones in the vehicles when the crashes occur. 

A laundry list of other ethical concerns must be addressed for any practical 
AV system to be deployed at scale. Many of them are not talked about often, 
but they will cause practical problems if not addressed. 

Finally, we present a set of principles for ethical regulatory approaches 
that address safety, compensation, transparency, inclusion, and non- 
discrimination. Legislators in most US states are running roughshod over 
those concerns, but one can hope that will improve over time. 


Chapter 11: 


This final chapter describes other materials that might be useful, including 
free online videos we have recorded on topics relevant to this book. 


1.3. Terminology and abbreviations 


Here is a list of abbreviations and key terms used in the book for 
reference. More precise definitions for some terms are provided in the 
chapters. These are quick reference definitions to remind the reader of the 
essential part of the definition. 


Abbreviations and terms 


• Acceptable safety – A system is acceptably safe if it has a very small
probability of substantive harm, follows best practices for safety
engineering, and presents a risk vs. benefit tradeoff that accounts for all
stakeholders who might suffer direct or indirect harm.

• ADS – Automated Driving System. The computer system that drives an
autonomous vehicle.

• AEB – Automatic Emergency Braking. A computer-based function that
automatically applies brakes to mitigate an impending collision.

• AHS – Automated Highway Systems. An AV technology demo project
from the 1990s.




• AI – Artificial Intelligence. This is so messy we’re not going to even
attempt a definition.²⁴ When you hear “AI” in the context of AVs usually
the term machine learning should have been used instead.

• ALARP – As Low As Reasonably Practicable. A risk framework
approach requiring reduction of risk to the degree practicable. For our
purposes this is reasonably equivalent to ALARA (As Low As
Reasonably Achievable) and SFAIRP (So Far As Is Reasonably
Practicable).

• ALKS – Automated Lane Keeping System, UNECE #157. A standard
for implementing a traffic jam pilot automation feature.²⁵

• ANSI/UL 4600 – A safety standard to ensure that an AV safety case has
considered everything it should.²⁶

• ASIL – Automotive Safety Integrity Level. An automotive-specific
variant of the concept of a SIL.

• AUHD – Average Unimpaired Human Driver. A potential baseline
reference for driving safety.

• Autonowashing – Overstating the autonomy capability of a vehicle or
technological approach.

• AV – Autonomous Vehicle. A vehicle operating without a requirement
for continuous human safety supervision.

• BRB – Big Red Button. An emergency stop button or the like to trigger
an urgent, but hopefully safe, shutdown of an automated system.

• DDT – Dynamic Driving Task. Normal driving, whether done by a
person or a machine.

• Defeasible reasoning – An approach to argument that is rationally
compelling but potentially falsifiable due to incomplete information.

• DMV – Department of Motor Vehicles. A state organization that licenses
drivers and manages vehicle registrations.

• DOT – Department of Transportation. US state and federal organizations
that regulate transportation safety.

• Fallback – Reacting to a vehicle failure, such as pulling to the side of the
road.

• Falsified – A claim that was thought to be true has been proven false by
observed data. Any claim in a sound safety case is believed to be true,
but is potentially falsifiable in the face of yet-to-be-encountered
unknowns.

• FMVSS – Federal Motor Vehicle Safety Standard(s). US test-based
standards for specific minimum required safety functionality.


24 Spoiler: an AV does not “think” like a person, even if we indulge in occasional 
anthropomorphizing descriptions. 

25 See: https://unece.org/transport/documents/2021/03/standards/un-regulation-no-157-automated-lane-keeping-systems-alks

26 See: https://users.ece.cmu.edu/~koopman/ul4600/index.html 




FN — False Negative: an object is there, but the system fails to detect it.

FP — False Positive: there is no object, but the system detects an object.

GAMAB — “Globalement Au Moins Aussi Bon” (French for “globally at least
as good”), which describes a risk framework in which one thing is overall
at least as good as another comparable type of system. (See also: PRB)

Geofence — An ODD limitation to specified locations and/or routes. 
ODDs typically address other operational limitations beyond just 
geofencing. 

GSN — Goal Structuring Notation. A defined notation for safety cases.

Harm — Injury or fatality inflicted upon people. See also PDO.

IIHS — Insurance Institute for Highway Safety. A US nonprofit funded 
by auto insurance companies. 

ISO 21448 — An automotive standard for “safety of the intended
function” (SOTIF) that encompasses driver assistance features and AVs.

ISO 26262 — An automotive functional safety standard that applies to
conventional vehicles as well as AVs.


KPI — Key Performance Indicator. A metric used to emphasize an 
important aspect of AV performance or some aspect of a company’s 
process performance that might — or might not — be relevant to safety.

Lagging metrics — Metrics that are gathered regarding safety outcomes
from operating the AV. 

Leading metrics — Metrics that are gathered to predict safety before loss 
events occur. 

Loss event — An AV incident involving damage to property or harm. 
Used instead of the term “accident.” A typical loss event involves a 
crash, but other types of loss events are possible. 

MEM — Minimum Endogenous Mortality. A risk framework based on
determining whether the risk of a system is significantly higher than the 
background exposure to other risks of everyday life. 

ML — Machine Learning. An approach to computation based on using 
training by example to set up a computationally simulated neural 
network. This is a specific technology used by most AVs that is often 
what is being referred to as “AI” (see: artificial intelligence). 

Moral Crumple Zone — The practice of assigning responsibility and 
blame for an automated system failure to some conveniently available 
person, especially if that person could not reasonably have been expected 
to prevent a loss event. See section 10.2.2. 

MRC — Minimal Risk Condition. Stopping the vehicle after performing
fallback. There is no actual requirement for the risk to be “minimal” in 
any sense as currently defined, but there is a requirement that it involves 
stopping the vehicle. This term is often used in regulations in a way that 
intends risk of harm while in an MRC to be acceptably low. 




NHTSA — National Highway Traffic Safety Administration. The US
Department of Transportation administration responsible for vehicle 
safety and managing recalls. 

NSC — National Safety Council. A US nonprofit public service 
organization promoting health and safety. 

OD — Operational Domain. The portion of the real world that the AV 
operates in. The ODD is an approximate model of the OD. 


ODD — Operational Design Domain. The conditions under which an AV
is intended to operate. This should be an acceptable model of the OD.

OEDR — Object and Event Detection and Response. Detecting objects
and other road situations, then changing own vehicle behavior in
response. Example: steering to avoid collision with an object.

OEM — Original Equipment Manufacturer. A company that integrates
and sells cars. Contrast with automotive suppliers who provide 
components to the OEM. 

PDO — Property Damage Only. A crash severity category in which no
harm was done to people, but some objects were damaged.

Permissiveness — How aggressively an AV can move within its ODD
without exceeding its safety limits.

PRA — Probabilistic Risk Assessment. Assessing risk as a sum of 
probabilities times consequences. 


PRB — Positive Risk Balance. An AV should be no worse than a human 
driver. 

RSS — Responsibility Sensitive Safety. A strategy for attaining provably 
blame-free AV behavior based on a Newtonian physics approach. 

SAE — The organization formerly known as the Society of Automotive 
Engineers. Now SAE is just short for “SAE International.” 


SAE J3016 — A terminology standard for automated vehicles that is
commonly mistaken for (but is most definitely NOT) a safety standard.

SAE J3018 — A standard covering human safety driver aspects of road
testing safety.

SAE Levels — A six-level categorization (Levels 0 to 5) designating the 
functionality assigned to automation equipment in a vehicle. The Levels 
are defined in SAE J3016. 

Safety Case — A structured argument, supported by evidence, that 
supports a claim that an AV is acceptably safe to deploy. 

SIL — Safety Integrity Level. Used to determine the engineering rigor to 
be applied to achieve acceptable risk mitigation for a safety-critical 
system or feature. 

SMS — Safety Management System. A system of metrics used to monitor 
for, identify, and correct safety issues. 

SOTIF — Safety Of The Intended Function. Associated with a 
methodology for identifying safety-related performance and requirement 
insufficiencies for driver assistance and AV technology. See ISO 21448. 




SPI — Safety Performance Indicator (pronounced S-P-I rather than
“spy”). A metric tied to a claim in a safety case and associated with a
threshold beyond which the claim has been falsified. An SPI violation
occurs when the SPI’s threshold has been exceeded by the metric value.

TN — True Negative: there is no object and the system detects no object.

TP — True Positive: an object is there and is recognized.

TTC — Time To Collision. A risk metric for how long it would be until a 
collision if vehicles were not maneuvered to avoid that collision. 

VMT — Vehicle Miles Traveled, often in millions (e.g., 100M VMT is 
100 million miles in total traveled by a set of vehicles). 

VSSA — Voluntary Safety Self-Assessment. A report to NHTSA 
submitted by some AV companies disclosing some information relevant 
to plans for safety. 
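
A few of the quantitative entries above (PRA, TTC, and SPI) can be made concrete with a short Python sketch. This is only an illustration under stated assumptions, not a method taken from any standard: the function and class names are hypothetical, TTC is computed with a simple constant-closing-speed model, and the SPI treats a larger metric value as worse, so a violation means the value is above the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


def pra_risk(scenarios: Iterable[Tuple[float, float]]) -> float:
    """PRA: sum of probability times consequence.

    Each scenario is a (probability, consequence) pair. The units of
    consequence are whatever the analysis uses (assumption).
    """
    return sum(probability * consequence for probability, consequence in scenarios)


def time_to_collision(gap_m: float, closing_speed_mps: float) -> Optional[float]:
    """TTC in seconds if neither vehicle maneuvers.

    Assumes a constant closing speed (a simplification). Returns None
    when the gap is not closing, since no collision is then predicted.
    """
    if closing_speed_mps <= 0:
        return None
    return gap_m / closing_speed_mps


@dataclass
class SPI:
    """A metric threshold tied to a safety case claim (names are hypothetical)."""
    claim: str
    threshold: float

    def violated(self, metric_value: float) -> bool:
        # Assumption: higher is worse, so exceeding the threshold
        # means the associated claim has been falsified.
        return metric_value > self.threshold


if __name__ == "__main__":
    print(pra_risk([(0.5, 2.0), (0.25, 8.0)]))
    print(time_to_collision(gap_m=30.0, closing_speed_mps=10.0))
    spi = SPI(claim="Hypothetical claim about hard-braking rate", threshold=2.0)
    print(spi.violated(2.5))
```

An SPI for which a smaller value is worse (for example, a minimum following distance) would need the comparison reversed.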


Numerical conventions: 


“K” — kilo/thousand (1,000), e.g., 100K is 100,000 
kph — kilometers per hour 

“M” — million (1,000,000), e.g., 80M is 80,000,000 
mph — miles per hour 
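
The numerical conventions above can be checked mechanically. The following Python sketch is a hypothetical helper, not something from the book: `parse_count` expands the “K” and “M” suffixes, and the speed conversions rely on the exact definition of one mile as 1.609344 kilometers.

```python
# Multipliers for the suffixes used in this book's numerical conventions.
SUFFIXES = {"K": 1_000, "M": 1_000_000}

# One international mile is defined as exactly 1.609344 kilometers.
KM_PER_MILE = 1.609344


def parse_count(text: str) -> int:
    """Expand a count such as "100K" or "80M" into an integer.

    Commas are ignored, so "100,000" is also accepted. Only the
    upper-case suffixes "K" and "M" are recognized (assumption).
    """
    cleaned = text.strip().replace(",", "")
    suffix = cleaned[-1:]
    if suffix in SUFFIXES:
        return round(float(cleaned[:-1]) * SUFFIXES[suffix])
    return int(cleaned)


def mph_to_kph(mph: float) -> float:
    """Convert miles per hour to kilometers per hour."""
    return mph * KM_PER_MILE


def kph_to_mph(kph: float) -> float:
    """Convert kilometers per hour to miles per hour."""
    return kph / KM_PER_MILE


if __name__ == "__main__":
    print(parse_count("100K"))   # 100000
    print(parse_count("80M"))    # 80000000
    print(mph_to_kph(60.0))
```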




Sadly, it is common for Web resources to go stale. If one of the cited
references becomes unavailable, try entering its URL into the archive
server at https://archive.org/


11.3. About the author 


Prof. Philip Koopman is an internationally recognized expert on Autonomous 
Vehicle (AV) safety whose work in that area spans over 25 years. He is also 
actively involved with AV policy and standards as well as more general 
embedded system design and software quality. His pioneering research work 
includes software robustness testing and run time monitoring of autonomous 
systems to identify how they break and how to fix them. He has extensive 
experience in software safety and software quality across numerous 
transportation, industrial, and defense application domains including 
conventional automotive software and hardware systems. He was the 
principal technical contributor to the UL 4600 standard for autonomous 
system safety issued in 2020. He is a faculty member of the Carnegie Mellon 
University ECE department, where he teaches software skills for
mission-critical systems. In 2018 he was awarded the highly selective IEEE-SSIT
Carl Barus Award for outstanding service in the public interest for his work 
in promoting automotive computer-based system safety. In 2022 he was 
named to the National Safety Council’s Mobility Safety Advisory Group. 


Web link: https://users.ece.cmu.edu/~koopman/ 


